-*- Mode: Text; Package: CMI; Base: 10 -*-


This describes every instruction that has so far been planned for
implementation using the Sprint and Weitek chips. The following are
listed for every instruction: the kind of instruction, the addressing
modes, whether conditionalization is on or off, the algorithm for the
instruction according to which chip is being used, the ucode space,
and the performance.

Exception handling is listed at the end of this file, because a lot
of it is common to most routines.
Instructions:  F+, F-, F*
Precision:     single
Addr-Mode:     two
Condition:     cond/always

Algorithm on WTL-3132:

  A op B -> A      32 bit  :conditional or :always
  #cycles          MB              FB
    32             B0->Td          free
    32             A0->Ta          Td->Reg
    32      +--->  B1->Td          Ta op Reg->Reg
    32      |      A1->Tb          Reg->Tc or Ta->Tc
    32      |      Tc->A0          Td->Reg
    32      |      B2->Tc          Tb op Reg->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
    32      +----  Td->A1          Tc->Reg
Ucode Space:   Common routine: 291
               F+: 13
               F-: 13
               F*: 13
Performance:   NO-VPs          VPs
               165 cycles      100 cycles per VP
                               + 65 cycles for overhead
Exception Detection Modes: 


Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+, F-, F*
Precision:     double
Addr-Mode:     two
Condition:     cond/always

Algorithm on WTL-3132:

Ucode Space:   Common routine:
               F+:
               F-:
               F*:
Performance:   NO-VPs          VPs


Exception Detection Modes: 


Instructions:  F+, F-, F*, F/
Precision:     double
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3164:

  A op B -> A      64 bit  :conditional
  #cycles          MB              FB
    32             B0-ls->Ta       free
    32             B0-ms->Tb       Ta->Reg-ls
    32      +--->  A0-ls->Tc       Tb->Reg-ms
    32      |      A0-ms->Td       free
    32      |      free            Tc,Td op Reg->Reg
    32      |      free            Tc,Td op Reg->Reg
    32      |      B1-ls->Td       Reg-ls->Ta or Tc->Ta
    32      |      Ta->A0-ls       Td->Reg-ls
    32      |      B1-ms->Ta       Reg-ms->Tb or Td->Tb
    32      +----  Tb->A0-ms       Ta->Reg-ms
Ucode Space:   Common routine: 325
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               325 cycles      260 cycles per VP
                               + 65 cycles for overhead
               F/: additional 448 cycles



Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according
to every function is described there. 
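
The performance figures above appear to follow a simple pattern: the
NO-VPs figure equals the per-VP cost plus the fixed overhead
(260 + 65 = 325). The Python sketch below encodes that reading. The
function name `total_cycles` and the assumption that the cost grows
linearly with the number of VPs are illustrative only and are not
stated in this document.

```python
def total_cycles(per_vp: int, overhead: int, vps: int = 1) -> int:
    """Estimate cycles as per-VP cost times the VP count plus fixed overhead.

    Assumption (not from the document): cost scales linearly with VP count.
    """
    if vps < 1:
        raise ValueError("vps must be at least 1")
    return per_vp * vps + overhead


# Double precision, two-address, conditional ops on the WTL-3164 are listed
# at 260 cycles per VP with 65 cycles of overhead.
one_vp = total_cycles(per_vp=260, overhead=65)
four_vps = total_cycles(per_vp=260, overhead=65, vps=4)
```

With one VP the sketch reproduces the 325-cycle figure from the table;
the four-VP value is only an extrapolation under the linear assumption.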
Instructions:  F+, F-, F*, F/
Precision:     double
Addr-Mode:     two
Condition:     always

Algorithm on WTL-3164:

  A op B -> A      64 bit  :always
  #cycles          MB              FB
    32             B0-ls->Ta       free
    32             B0-ms->Tb       Ta->Reg-ls
    32             A0-ls->Tc       Tb->Reg-ms
    32             A0-ms->Td       free
    32      +--->  B1-ls->Ta       Tc,Td op Reg->Reg
    32      |      B1-ms->Tb       Tc,Td op Reg->Reg
    32      |      A1-ls->Tc       Reg-ls->Td
    32      |      Td->A0-ls       Ta->Reg-ls
    32      |      A1-ms->Td       Reg-ms->Ta
    32      +----  Ta->A0-ms       Tb->Reg-ms
Ucode Space:   Common routine: 325
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               325 cycles      197 cycles per VP
                               +128 cycles for overhead
               F/: additional 448 cycles


Exception Detection Modes: 


Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
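
The -ls and -ms suffixes in the 64 bit tables appear to denote the
least significant and most significant 32 bit halves of an operand,
each moved in its own 32 cycle transfer. The Python sketch below shows
that split under the assumption that operands are IEEE 754 double
precision values; the helper names `split_double` and `join_double` are
hypothetical and do not come from this document.

```python
import struct


def split_double(x: float) -> tuple[int, int]:
    """Return (ls, ms): the low and high 32 bit words of a 64 bit double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_double(ls: int, ms: int) -> float:
    """Rebuild a double from its low (ls) and high (ms) 32 bit words."""
    bits = ((ms & 0xFFFFFFFF) << 32) | (ls & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


ls_word, ms_word = split_double(1.0)
round_trip = join_double(*split_double(-2.5))
```

The sign and exponent land in the ms word, which is why the two halves
have to be loaded into separate temporaries before the chip can operate
on the full value.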
Instructions:  F+, F-, F*, F/
Precision:     single
Addr-Mode:     two
Condition:     cond/always

Algorithm on WTL-3164:

  A op B -> A      32 bit  :conditional or :always
  #cycles          MB              FB
    32             B0->Td          free
    32             A0->Ta          Td->Reg-ms
    32      +--->  B1->Td          Ta op Reg-ms->Reg-ms
    32      |      A1->Tb          Reg-ms->Tc or Ta->Tc
    32      |      Tc->A0          Td->Reg-ms
    32      |      B2->Tc          Tb op Reg-ms->Reg-ms
    32      |      A2->Ta          Reg->Td or Tb->Td
    32      +----  Td->A1          Tc->Reg-ms
Ucode Space:   Common routine: 291
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               165 cycles      100 cycles per VP
                               + 65 cycles for overhead
               F/: additional 192 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+, F-, F*
Precision:     single
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3132:

  A op B -> C      32 bit  :conditional
  #cycles          MB              FB
    32             B0->Ta          free
    32             A0->Tb          Ta->Reg
    32      +--->  C0->Ta          Tb op Reg->Reg
    32      |      B1->Td          Reg->Tc or Ta->Tc
    32      |      Tc->C0          Td->Reg
    32      +----  A1->Tb          free
Ucode Space:   Common routine: 200
               F+: 13
               F-: 13
               F*: 13
Performance:   NO-VPs          VPs
               165 cycles      132 cycles per VP
                               + 65 cycles for overhead
Exception Detection Modes: 


Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+, F-, F*
Precision:     double
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3132:

  A op B -> C      64 bit  :conditional
  #cycles          MB              FB
Ucode Space:   Common routine:
               F+:
               F-:
               F*:
Performance:   NO-VPs          VPs


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+, F-, F*, F/
Precision:     double
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3164:

  A op B -> C      64 bit  :conditional
  #cycles          MB              FB
    32             B0-ls->Ta       free
    32             B0-ms->Tb       Ta->Reg-ls
    32             A0-ls->Tc       Tb->Reg-ms
    32      +--->  A0-ms->Td       free
    32      |      C0-ls->Ta       Tc,Td op Reg->Reg
    32      |      C0-ms->Tb       Tc,Td op Reg->Reg
    32      |      B1-ls->Td       Reg-ls->Tc or Ta->Tc
    32      |      Tc->C0-ls       Td->Reg-ls
    32      |      B1-ms->Ta       Reg-ms->Td or Tb->Td
    32      |      Td->C0-ms       Ta->Reg-ms
    32      +----  A1-ls->Tc       free
Ucode Space:   Common routine: 353
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               325 cycles      260 cycles per VP
                               + 65 cycles for overhead
               F/: additional 448 cycles



Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+, F-, F*, F/
Precision:     single
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3164:

  A op B -> C      32 bit  :conditional
  #cycles          MB              FB
    32             B0->Ta          free
    32             A0->Tb          Ta->Reg
    32      +--->  C0->Ta          Tb op Reg->Reg
    32      |      B1->Td          Reg->Tc or Ta->Tc
    32      |      Tc->C0          Td->Reg
    32      +----  A1->Tb          free
Ucode Space:   Common routine: 200
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               165 cycles      132 cycles per VP
                               + 65 cycles for overhead
               F/: additional 192 cycles



Exception Detection Modes: 
Please refer to the section on exception handling at the
end of this document. The cost for each mode according
to every function is described there.
Instructions:  F+, F-, F*
Precision:     single
Addr-Mode:     three
Condition:     always

Algorithm on WTL-3132:

  A op B -> C      32 bit  :always
  #cycles          MB              FB
    32             B0->Td          free
    32             A0->Ta          Td->Reg
    32      +--->  B1->Td          Ta op Reg->Reg
    32      |      A1->Tb          Reg->Tc or Ta->Tc
    32      |      Tc->C0          Td->Reg
    32      |      B2->Tc          Tb op Reg->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
    32      +----  Td->C1          Tc->Reg
Ucode Space:   Common routine: 291
               F+: 13
               F-: 13
               F*: 13
Performance:   NO-VPs          VPs
               165 cycles      100 cycles per VP
                               + 65 cycles for overhead
Exception Detection Modes: 


Please refer to the section on exception handling at the
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+, F-, F*
Precision:     double
Addr-Mode:     three
Condition:     always

Algorithm on WTL-3132:

  A op B -> C      64 bit  :always
  #cycles          MB              FB
Ucode Space:   Common routine:
               F+:
               F-:
               F*:
Performance:   NO-VPs          VPs


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there.
Instructions:  F+, F-, F*, F/
Precision:     double
Addr-Mode:     three
Condition:     always

Algorithm on WTL-3164:

  A op B -> C      64 bit  :always
  #cycles          MB              FB
    32             B0-ls->Ta       free
    32             B0-ms->Tb       Ta->Reg-ls
    32             A0-ls->Tc       Tb->Reg-ms
    32             A0-ms->Td       free
    32      +--->  B1-ls->Ta       Tc,Td op Reg->Reg
    32      |      B1-ms->Tb       Tc,Td op Reg->Reg
    32      |      A1-ls->Tc       Reg-ls->Td
    32      |      Td->C0-ls       Ta->Reg-ls
    32      |      A1-ms->Td       Reg-ms->Ta
    32      +----  Ta->C0-ms       Tb->Reg-ms
Ucode Space:   Common routine: 325
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               325 cycles      197 cycles per VP
                               +128 cycles for overhead
               F/: additional 448 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 


**********************************************************************


Instructions:  F+, F-, F*, F/
Precision:     single
Addr-Mode:     three
Condition:     always

Algorithm on WTL-3164:

  A op B -> C      32 bit  :always
  #cycles          MB              FB
    32             B0->Td          free
    32             A0->Ta          Td->Reg
    32      +--->  B1->Td          Ta op Reg->Reg
    32      |      A1->Tb          Reg->Tc or Ta->Tc
    32      |      Tc->C0          Td->Reg
    32      |      B2->Tc          Tb op Reg->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
    32      +----  Td->C1          Tc->Reg
Ucode Space:   Common routine: 291
               F+: 13
               F-: 13
               F*: 13
               F/: 25
Performance:   NO-VPs          VPs
               165 cycles      100 cycles per VP
                               + 65 cycles for overhead
               F/: additional 192 cycles
Exception Detection Modes:

Please refer to the section on exception handling at the
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant
Precision:     single
Addr-Mode:     one
Condition:     cond/always

Algorithm on WTL-3132:

  A op Const -> A  32 bit  :conditional or :always
  #cycles          MB              FB
     2             Const->Bypass   free
     1             free            Bypass->Reg[31]
    32             A0->Ta          free
    32             free            Ta op Reg[31]->Reg
    32      +--->  A1->Tb          Reg->Tc or Ta->Tc
     1      |      free            Bypass->Reg[31]
    32      |      Tc->A0          Tb op Reg[31]->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
     1      |      free            Bypass->Reg[31]
    32      +----  Td->A1          Ta op Reg[31]->Reg
Ucode Space:   Common routine: 200
               F+constant: 13
               F-constant: 13
               F*constant: 13
Performance:   NO-VPs          VPs
               135 cycles      67 cycles per VP
                               + 68 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant
Precision:     double
Addr-Mode:     one
Condition:     cond/always

Algorithm on WTL-3132:

  A op Const -> A  64 bit  :always
  #cycles          MB              FB
Ucode Space:   Common routine:
               F+constant:
               F-constant:
               F*constant:
Performance:   NO-VPs          VPs
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
fp-algorithms.text dcmrsprint>fB482 A: 7/27/87 17:22:33 page 1
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     double
Addr-Mode:     one
Condition:     cond

Algorithm on WTL-3164:

  A op Const -> A  64 bit  :conditional
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             free            Tb->Reg-ms
     5      +--->  Const->Xreg
    32      |      free            Reg op Xreg->Reg
    32      |      A1-ls->Td       Reg-ls->Tc or Ta->Tc
    32      |      Tc->A0-ls       Td->Reg-ls
    32      |      A1-ms->Tc       Reg-ms->Td or Tb->Td
    32      |      Td->A0-ms       Tc->Reg-ms
     5      |      Const->Xreg
    32      |      free            Reg op Xreg->Reg
    32      |      A2-ls->Ta       Reg-ls->Tb or Td->Tb
    32      |      Tb->A0-ls       Ta->Reg-ls
    32      |      A2-ms->Tb       Reg-ms->Ta or Tc->Ta
    32      +----  Ta->A0-ms       Tb->Reg-ms
Ucode Space:   Common routine: 420
               F+constant: 15
               F-constant: 15
               F*constant: 15
               F/constant: 25
Performance:   NO-VPs          VPs
               265 cycles      169 cycles per VP
                               + 96 cycles for overhead
               F/: additional 448 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     double
Addr-Mode:     one
Condition:     always

Algorithm on WTL-3164:

  A op Const -> A  64 bit  :always
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             free            Tb->Reg-ms
     5             Const->Bypass
    32             A1-ls->Tc       Reg op Xreg->Reg
    32      +--->  A1-ms->Td       Reg-ls->Ta
    32      |      Ta->A0-ls       Tc->Reg-ls
    32      |      A2-ls->Tc       Reg-ms->Tb
    32      |      Tb->A0-ms       Ta->Reg-ms
     5      +----  Const->Bypass
Ucode Space:   Common routine: 278
               F+constant: 13
               F-constant: 13
               F*constant: 13
               F/constant: 25
Performance:   NO-VPs          VPs
               265 cycles      128 cycles per VP
                               +133 cycles for overhead
               F/: additional 448 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     single
Addr-Mode:     one
Condition:     cond/always

Algorithm on WTL-3164:

  A op Const -> A  32 bit  :conditional or :always
  #cycles          MB              FB
     2             Const->Bypass   free
     1             free            Bypass->Reg[31]
    32             A0->Ta          free
    32             free            Ta op Reg[31]->Reg
    32      +--->  A1->Tb          Reg->Tc or Ta->Tc
     1      |      free            Bypass->Reg[31]
    32      |      Tc->A0          Tb op Reg[31]->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
     1      |      free            Bypass->Reg[31]
    32      +----  Td->A1          Ta op Reg[31]->Reg
Ucode Space:   Common routine: 200
               F+constant: 13
               F-constant: 13
               F*constant: 13
               F/constant: 25
Performance:   NO-VPs          VPs
               135 cycles      67 cycles per VP
                               + 68 cycles for overhead
               F/: additional 192 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant
Precision:     single
Addr-Mode:     two
Condition:     always

Algorithm on WTL-3132:

  A op Const -> B  32 bit  :always
  #cycles          MB              FB
     2             Const->Bypass   free
     1             free            Bypass->Reg[31]
    32             A0->Ta          free
    32             free            Ta op Reg[31]->Reg
    32      +--->  A1->Tb          Reg->Tc or Ta->Tc
     1      |      free            Bypass->Reg[31]
    32      |      Tc->B0          Tb op Reg[31]->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
     1      |      free            Bypass->Reg[31]
    32      +----  Td->B2          Ta op Reg[31]->Reg
Ucode Space:   Common routine: 200
               F+: 13
               F-: 13
               F*: 13
Performance:   NO-VPs          VPs
               135 cycles      67 cycles per VP
                               + 68 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant
Precision:     double
Addr-Mode:     two
Condition:     always

Algorithm on WTL-3132:

  A op Const -> B  64 bit  :always
  #cycles          MB              FB
Ucode Space:   Common routine:
               F+constant:
               F-constant:
               F*constant:
Performance:   NO-VPs          VPs

Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     double
Addr-Mode:     two
Condition:     always

Algorithm on WTL-3164:

  A op Const -> B  64 bit  :always
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             free            Tb->Reg-ms
     5             Const->Bypass
    32             A1-ls->Tc       Reg op Xreg->Reg
    32      +--->  A1-ms->Td       Reg-ls->Ta
    32      |      Ta->B0-ls       Tc->Reg-ls
    32      |      A2-ls->Tc       Reg-ms->Tb
    32      |      Tb->B0-ms       Ta->Reg-ms
     5      +----  Const->Bypass
Ucode Space:   Common routine: 270
               F+constant: 13
               F-constant: 13
               F*constant: 13
               F/constant: 25
Performance:   NO-VPs          VPs
               265 cycles      128 cycles per VP
                               +133 cycles for overhead
               F/: additional 448 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     single
Addr-Mode:     two
Condition:     always

Algorithm on WTL-3164:

  A op Const -> B  32 bit  :always
  #cycles          MB              FB
     2             Const->Bypass   free
     1             free            Bypass->Reg[31]
    32             A0->Ta          free
    32             free            Ta op Reg[31]->Reg
    32      +--->  A1->Tb          Reg->Tc or Ta->Tc
     1      |      free            Bypass->Reg[31]
    32      |      Tc->B0          Tb op Reg[31]->Reg
    32      |      A2->Ta          Reg->Td or Tb->Td
     1      |      free            Bypass->Reg[31]
    32      +----  Td->B2          Ta op Reg[31]->Reg
Ucode Space:   Common routine: 200
               F+constant: 13
               F-constant: 13
               F*constant: 13
               F/constant: 25
Performance:   NO-VPs          VPs
               135 cycles      67 cycles per VP
                               + 68 cycles for overhead
               F/: additional 192 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant
Precision:     single
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3132:

  A op Const -> B  32 bit  :conditional
  #cycles          MB              FB
     2             Const->Bypass   free
     1             free            Bypass->Reg[31]
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta op Reg[31]->Reg
    32      |      A1->Ta          Reg->Td or Tb->Td
    32      |      Td->B0          free
     1      +----  free            Bypass->Reg[31]
Ucode Space:   Common routine: 135
               F+constant: 13
               F-constant: 13
               F*constant: 13
Performance:   NO-VPs          VPs
               135 cycles      100 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant
Precision:     double
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3132:

  A op Const -> B  64 bit  :conditional
  #cycles          MB              FB
Ucode Space:   Common routine:
               F+constant:
               F-constant:
               F*constant:
               F/constant:
Performance:   NO-VPs          VPs
Exception Detection Modes: 
Please refer to the section on exception handling at the
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     double
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3164:

  A op Const -> B  64 bit  :conditional
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             B0-ls->Ta       Tb->Reg-ms
     5      +--->  Const->Bypass
    32      |      B0-ms->Tb       Reg op Xreg->Reg
    32      |      A1-ls->Td       Reg-ls->Tc or Ta->Tc
    32      |      Tc->B0-ls       Td->Reg-ls
    32      |      A1-ms->Ta       Reg-ms->Td or Tb->Td
    32      |      Td->B0-ms       Ta->Reg-ms
    32      +----  B1-ls->Ta       free
Ucode Space:   Common routine: 310
               F+constant: 15
               F-constant: 15
               F*constant: 15
               F/constant: 25
Performance:   NO-VPs          VPs
               265 cycles      197 cycles per VP
                               + 96 cycles for overhead
               F/: additional 448 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F+constant, F-constant, F*constant, F/constant
Precision:     single
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3164:

  A op Const -> B  32 bit  :conditional
  #cycles          MB              FB
     2             Const->Bypass   free
     1             free            Bypass->Reg[31]
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta op Reg[31]->Reg
    32      |      A1->Ta          Reg->Td or Tb->Td
    32      |      Td->B0          free
     1      +----  free            Bypass->Reg[31]
Ucode Space:   Common routine: 135
               F+constant: 13
               F-constant: 13
               F*constant: 13
               F/constant: 25
Performance:   NO-VPs          VPs
               135 cycles      100 cycles per VP
                               + 35 cycles for overhead
               F/: additional 192 cycles


Exception Detection Modes: 
Please refer to the section on exception handling at the
end of this document. The cost for each mode according
to every function is described there.
Instructions:  F=, F(not =), F>, F(> and =), F<, F(< and =)
Precision:     single
Addr-Mode:     two
Condition:     cond/always

Algorithm on WTL-3132:

  A compare B      32 bit  :conditional or :always
  #cycles          MB              FB
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta->Reg
    32      |      A1->Tc          Tb compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
    32      |      B2->Td          Tc->Reg
    32      |      A2->Ta          Td compare Reg  <- compare status bits go to Sprint
     5      +----  status->mem     free            <- producing correct test-flag
Ucode Space:   Common routine: 168
               F=:         15
               F(not =):   15
               F>:         15
               F(> and =): 15
               F<:         15
               F(< and =): 15
Performance:   NO-VPs          VPs
               105 cycles      70 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes:
Please refer to the section on exception handling at the
end of this document. The cost for each mode according
to every function is described there.
The 64 bit compares on WTL-32 will not be supported; it will be
faster to do it on the CM using current code.
Instructions:  F=, F(not =), F>, F(> and =), F<, F(< and =)
Precision:     double
Addr-Mode:     two
Condition:     cond/always

Algorithm on WTL-3164:

  A compare B      64 bit  :conditional or :always
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             B0-ls->Td       free
    32      +--->  B0-ms->Tc       Tb->Reg-ms
    32      |      A1-ls->Ta       Tc,Td compare Reg  <- compare status bits go to Sprint
    32      |      A1-ms->Tb       Tc,Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free               <- producing correct test-flag
    32      +----  B1-ls->Td       Ta->Reg-ls
Ucode Space:   Common routine: 230
               F=:         15
               F(not =):   15
               F>:         15
               F(> and =): 15
               F<:         15
               F(< and =): 15
Performance:   NO-VPs          VPs
               208 cycles      135 cycles per VP
                               + 98 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  F=, F(not =), F>, F(> and =), F<, F(< and =)
Precision:     single
Addr-Mode:     two
Condition:     cond/always

Algorithm on WTL-3164:

  A compare B      32 bit  :conditional or :always
  #cycles          MB              FB
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta->Reg
    32      |      A1->Tc          Tb compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
    32      |      B2->Td          Tc->Reg
    32      |      A2->Ta          Td compare Reg  <- compare status bits go to Sprint
     5      +----  status->mem     free            <- producing correct test-flag
Ucode Space:   Common routine: 168
               F=:         15
               F(not =):   15
               F>:         15
               F(> and =): 15
               F<:         15
               F(< and =): 15
Performance:   NO-VPs          VPs
               105 cycles      70 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 




Instructions:  Fmin, Fmax
Precision:     single
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3132:

  A max B -> A     32 bit  :conditional
  #cycles          MB              FB
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta->Reg
    32      |      A1->Tc          Tb compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
    96      |      CM moving max of A and B into A
    32      |      B2->Td          Tc->Reg
    32      |      A2->Ta          Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
    96      +----  CM moving max of A and B into A
Ucode Space:   Common routine: 168
               Fmax: 25
               Fmin: 25
Performance:   NO-VPs          VPs
               201 cycles      166 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes: 


Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
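
The table above splits Fmax into two phases: the Weitek chip compares
A with B and the status bits become a test flag, then the CM itself
moves the larger value into A. The Python sketch below models that
two-phase flow per VP. The function name `fmax_into_a`, the list-based
representation, and the `context` argument standing in for the
conditional (:conditional) case are assumptions made for illustration.

```python
def fmax_into_a(a, b, context=None):
    """Two-phase max: compare to get a test flag, then move where needed.

    a, b    -- per-VP operand lists; a is updated in place
    context -- per-VP activity flags (assumed model of conditionalization)
    """
    if context is None:
        context = [True] * len(a)
    for i in range(len(a)):
        # Phase 1: the FP compare produces a test flag for this VP.
        test_flag = a[i] < b[i]
        # Phase 2: the CM moves B into A only where the VP is active
        # and the flag says B is the larger value.
        if context[i] and test_flag:
            a[i] = b[i]
    return a


result = fmax_into_a([1.0, 5.0, -3.0], [2.0, 4.0, -1.0],
                     context=[True, True, False])
```

The third VP is inactive in this example, so its A value is left alone
even though B is larger there.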
The 64 bit compares on WTL-32 will not be supported; it will be
faster to do it on the CM using current code.


Instructions:  Fmin, Fmax
Precision:     double
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3164:

  A max B -> A     64 bit  :conditional or :always
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             B0-ls->Td       free
    32      +--->  B0-ms->Tc       Tb->Reg-ms
    32      |      A1-ls->Ta       Tc,Td compare Reg  <- compare status bits go to Sprint
    32      |      A1-ms->Tb       Tc,Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free               <- producing correct test-flag
   192      |      CM moving max of A and B into A
    32      +----  B1-ls->Td       Ta->Reg-ls
Ucode Space:   Common routine: 230
               Fmax: 25
               Fmin: 25
Performance:   NO-VPs          VPs
               392 cycles      327 cycles per VP
                               + 98 cycles for overhead
Exception Detection Modes: 


Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  Fmin, Fmax
Precision:     single
Addr-Mode:     two
Condition:     cond

Algorithm on WTL-3164:

  A max B -> A     32 bit  :conditional
  #cycles          MB              FB
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta->Reg
    32      |      A1->Tc          Tb compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
    96      |      CM moving max of A and B into A
    32      |      B2->Td          Tc->Reg
    32      |      A2->Ta          Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
    96      +----  CM moving max of A and B into A
Ucode Space:   Common routine: 168
               Fmax: 25
               Fmin: 25
Performance:   NO-VPs          VPs
               201 cycles      166 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  Fmin, Fmax
Precision:     single
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3132:

  A max B -> C     32 bit  :conditional
  #cycles          MB              FB
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta->Reg
    32      |      A1->Tc          Tb compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
   128      |      CM moving max of A and B into C
    32      |      B2->Td          Tc->Reg
    32      |      A2->Ta          Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
   128      +----  CM moving max of A and B into C
Ucode Space:   Common routine: 168
               Fmax: 38
               Fmin: 38
Performance:   NO-VPs          VPs
               233 cycles      198 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes: 


Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
The 64 bit compares on WTL-32 will not be supported; it will be
faster to do it on the CM using current code.


Instructions:  Fmin, Fmax
Precision:     double
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3164:

  A max B -> C     64 bit  :conditional or :always
  #cycles          MB              FB
    32             A0-ls->Ta       free
    32             A0-ms->Tb       Ta->Reg-ls
    32             B0-ls->Td       free
    32      +--->  B0-ms->Tc       Tb->Reg-ms
    32      |      A1-ls->Ta       Tc,Td compare Reg  <- compare status bits go to Sprint
    32      |      A1-ms->Tb       Tc,Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free               <- producing correct test-flag
   256      |      CM moving max of A and B into C
    32      +----  B1-ls->Td       Ta->Reg-ls
Ucode Space:   Common routine: 230
               F=:         15
               F(not =):   15
               F>:         15
               F(> and =): 15
               F<:         15
               F(< and =): 15
Performance:   NO-VPs          VPs
               456 cycles      391 cycles per VP
                               + 98 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  Fmin, Fmax
Precision:     single
Addr-Mode:     three
Condition:     cond

Algorithm on WTL-3164:

  A max B -> C     32 bit  :conditional
  #cycles          MB              FB
    32             A0->Ta          free
    32      +--->  B0->Tb          Ta->Reg
    32      |      A1->Tc          Tb compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
   128      |      CM moving max of A and B into C
    32      |      B2->Td          Tc->Reg
    32      |      A2->Ta          Td compare Reg  <- compare status bits go to Sprint
     5      |      status->mem     free            <- producing correct test-flag
   128      +----  CM moving max of A and B into C
Ucode Space:   Common routine: 168
               Fmax: 38
               Fmin: 38
Performance:   NO-VPs          VPs
               233 cycles      198 cycles per VP
                               + 35 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions:  Fcos, Fsin, Ftan, (e x), (log x) and etc...
Precision:     single
Addr-Mode:     one
Condition:     cond/always

Algorithm on WTL-3132:
Third order polynomial evaluation for all of these instructions.

  Poly (A) -> A    32 bit  :conditional or :always

Comment: Constants for various functions have already been
loaded into the Weitek chip. They are in temp registers.

  #cycles          MB              FB
    32             A0->Ta          free
    32             A1->Tb          (Ta * C3) + C2->Reg
    32      +--->  free            (Ta * Reg) + C1->Reg
    32      |      free            (Ta * Reg) + C0->Reg
    32      |      free            Reg->Tc
    32      |      Tc->A0          (Tb * C3) + C2->Reg
    32      |      A2->Ta          (Tb * Reg) + C1->Reg
    32      |      free            (Tb * Reg) + C0->Reg
    32      |      free            Reg->Td
    32      +----  Td->A0          (Ta * C3) + C2->Reg
Ucode Space:   Common routine: 338
               Fcos:    22
               Fsin:    20
               Ftan:    20
               (e x):   20
               (log x): 20
Performance:   NO-VPs          VPs
               200 cycles      135 cycles per VP
                               + 65 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
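
The three multiply-add steps in the FB column above are Horner's rule for a
third order polynomial. A minimal sketch in Python follows; the function name
and the coefficient values used in the example are illustrative assumptions
and are not the constants loaded into the Weitek chip.

```python
def poly3(x, c0, c1, c2, c3):
    """Evaluate c0 + c1*x + c2*x**2 + c3*x**3 with three multiply-adds."""
    reg = x * c3 + c2    # (Ta * C3) + C2 -> Reg
    reg = x * reg + c1   # (Ta * Reg) + C1 -> Reg
    reg = x * reg + c0   # (Ta * Reg) + C0 -> Reg
    return reg


if __name__ == "__main__":
    # Arbitrary coefficients, chosen only to make the arithmetic easy to check.
    print(poly3(2.0, 1.0, 2.0, 3.0, 4.0))
```

Each step reuses the previous result in Reg, which is why the table shows one
multiply-add per 32-cycle slot rather than separate power computations.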
Instructions: Fcos, Fsin, Ftan, (^ e x), (log x) and etc...
Precision:    double
Addr-Mode:    one
Condition:    cond/always


Algorithm on WTL-3132:
        Third order polynomial evaluation for all of these instructions.
        Poly (A) -> A           32 bit :conditional or :always

        Comment: This code will have to be implemented in Lisp
        as a subroutine calling the double precision multiply
        and double precision add code.


Performance:    NO-VPs                  VPs
                (+ (* 3 F*)             (+ (* 3 F*)
                   (* 3 F+))               (* 3 F+))


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions: Fcos, Fsin, Ftan, (^ e x), (log x) and etc...
Precision:    double
Addr-Mode:    one
Condition:    cond/always


Algorithm on WTL-3164:
        Third order polynomial evaluation for all of these instructions.
        Poly (A) -> A           64 bit :conditional or :always


        Comment: Constants for various functions have already been
        loaded into the Weitek chip. They are in temp registers.

        #cycles   MB              FB
        32    +-->A0-ls->Ta       free
        32    |   A0-ms->Tb       free
        64    |   free            (Ta,Tb * C3) + C2->Reg
        64    |   free            (Ta,Tb * Reg) + C1->Reg
        64    |   free            (Ta,Tb * Reg) + C0->Reg
        32    |   free            Reg-ls->Tc or Ta->Tc
        32    |   Tc->A0-ls       Reg-ms->Td or Tb->Td
        32    +---Td->A0-ms       free
Ucode Space:    Common routine: 40
                Fcos:           20
                Fsin:           20
                Ftan:           20
                (^ e x):        20
                (log x):        20
Performance:    NO-VPs          VPs
                568 cycles      368 cycles


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions: Fcos, Fsin, Ftan, (^ e x), (log x) and etc...
Precision:    single
Addr-Mode:    one
Condition:    cond/always


Algorithm on WTL-3164:
        Third order polynomial evaluation for all of these instructions.
        Poly (A) -> A           32 bit :conditional or :always


        Comment: Constants for various functions have already been
        loaded into the Weitek chip. They are in temp registers.


        #cycles   MB              FB
        32        A0->Ta          free
        32        A1->Tb          (Ta * C3) + C2->Reg
        32    +-->free            (Ta * Reg) + C1->Reg
        32    |   free            (Ta * Reg) + C0->Reg
        32    |   free            Reg->Tc
        32    |   Tc->A0          (Tb * C3) + C2->Reg
        32    |   A2->Ta          (Tb * Reg) + C1->Reg
        32    |   free            (Tb * Reg) + C0->Reg
        32    |   free            Reg->Td
        32    +---Td->A0          (Ta * C3) + C2->Reg
Ucode Space:    Common routine: 330
                Fcos:           20
                Fsin:           20
                Ftan:           20
                (^ e x):        26
                (log x):        22
Performance:    NO-VPs          VPs
                200 cycles      135 cycles per VP
                                + 65 cycles for overhead
Exception Detection Modes: 
Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Instructions: Global (F+, F-, F*)
Precision:    single
Addr-Mode:    one
Condition:    cond/always


Algorithm on WTL-3132:
        The first 32 values get collapsed on every chip.
        Dimension zero...


        Comment: This code uses cube swaps to achieve the
        global value.


        #cycles   MB              FB
        32        A0->Ta          free
        32        free            Ta->Reg
        33        free            Reg op Reg->Reg[1]
        1         free            Reg[1]->Ta[1]
        1         Ta[1]->MEM      free
        (* 2 32)  cube swap along dimension 1
        32        MEM->Ta[1]      free
        1         free            Ta[1] op Reg[1]->Reg[2]
        1         free            Reg->Ta[2]
        1         Ta[2]->MEM      free
        (* 2 32)  cube swap along dimension 2
        32        MEM->Ta[2]      free
        1         free            Ta[2] op Reg[2]->Reg[3]
        1         free            Reg->Ta[3]
        1         Ta[3]->MEM      free
        (* 2 32)  cube swap along dimension 3
Ucode Space:    Common routine: 450
                Global F+:      30
                Global F-:      30
                Global F*:      30
Performance:    NO-VPs          VPs
                1100 cycles     1000 cycles per VP
                                + 96 cycles for overhead


Exception Detection Modes:
        Please refer to the section on exception handling at the
        end of this document. The cost for each mode according
        to every function is described there.
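
The two stages above (collapse the local values on each chip, then combine
across chips with one cube swap per dimension) can be sketched in Python as
follows. This is an illustration only: it models chips as list positions,
assumes the chip count is a power of two, and assumes the operator is
associative and commutative (as with + and *). The function names are
hypothetical.

```python
import operator
from functools import reduce


def collapse_on_chip(local_values, op):
    """Stage 1: fold the values held on one chip into a single partial result."""
    return reduce(op, local_values)


def global_reduce(partials, op):
    """Stage 2: combine one partial per chip using cube swaps.

    After the swap along dimension d, chip i holds its own value combined
    with the value of chip i XOR 2**d. After all dimensions, every chip
    holds the global value.
    """
    n = len(partials)
    assert n > 0 and n & (n - 1) == 0, "chip count must be a power of two"
    vals = list(partials)
    dim = 0
    while (1 << dim) < n:
        received = [vals[i ^ (1 << dim)] for i in range(n)]  # cube swap
        vals = [op(v, r) for v, r in zip(vals, received)]    # Ta op Reg -> Reg
        dim += 1
    return vals


if __name__ == "__main__":
    chips = [[1.0] * 32 for _ in range(4)]
    partials = [collapse_on_chip(c, operator.add) for c in chips]
    print(global_reduce(partials, operator.add))
```

The number of swap stages grows with the logarithm of the chip count, which is
why the table repeats the same four-line pattern once per dimension.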
Instructions: Global (F+, F-, F*)
Precision:    double
Addr-Mode:    one
Condition:    cond/always


Algorithm on WTL-3164:
        The first 64 values get collapsed on every chip.
        Dimension zero...


        Comment: This code uses cube swaps to achieve the
        global value. For the conditional global multiply it
        will also need a zero source on the FB port.

        #cycles   MB                  FB
        32        A0-ls->Ta           free
        32        A0-ms->Tb           Ta->Reg-ls
        32        free                Tb->Reg-ms
        33        free                Reg op Reg->Reg[1]
        2         free                Reg[1]->Ta[1],Tb[1]
        2         Ta[1],Tb[1]->MEM    free
        (* 2 64)  cube swap dimension 1
        64        MEM->Ta[1],Tb[1]    free
        2         free                Ta[1] op Reg[1]->Reg[2]
        2         free                Reg[2]->Ta[2],Tb[2]
        2         Ta[2],Tb[2]->MEM    free
        (* 2 64)  cube swap dimension 2
Ucode Space:    Common routine: 450
                Global F+:      30
                Global F-:      30
                Global F*:      30
Performance:    NO-VPs          VPs
                2200 cycles     2000 cycles per VP
                                + 96 cycles for overhead


Exception Detection Modes:
        Please refer to the section on exception handling at the
        end of this document. The cost for each mode according
        to every function is described there.
Instructions: Global (F+, F-, F*)
Precision:    single
Addr-Mode:    one
Condition:    cond/always


Algorithm on WTL-3164:
        The first 32 values get collapsed on every chip.
        Dimension zero...


        Comment: This code uses cube swaps to achieve the
        global value.

        #cycles   MB              FB
        32        A0->Ta          free
        32        free            Ta->Reg
        33        free            Reg op Reg->Reg[1]
        1         free            Reg[1]->Ta[1]
        1         Ta[1]->MEM      free
        (* 2 32)  cube swap along dimension 1
        32        MEM->Ta[1]      free
        1         free            Ta[1] op Reg[1]->Reg[2]
        1         free            Reg->Ta[2]
        1         Ta[2]->MEM      free
        (* 2 32)  cube swap along dimension 2
        32        MEM->Ta[2]      free
        1         free            Ta[2] op Reg[2]->Reg[3]
        1         free            Reg->Ta[3]
        1         Ta[3]->MEM      free
        (* 2 32)  cube swap along dimension 3
Ucode Space:    Common routine: 450
                Global F+:      30
                Global F-:      30
                Global F*:      30
Performance:    NO-VPs          VPs
                1100 cycles     1000 cycles per VP
                                + 96 cycles for overhead


Exception Detection Modes:
        Please refer to the section on exception handling at the
        end of this document. The cost for each mode according
        to every function is described there.
Instructions: F/, F/constant
Precision:    single
Addr-Mode:    one, two, three
Condition:    cond/always


Algorithm on WTL-3132:
        Uses the Newton-Raphson method to obtain a/b.
        A divide B -> A         32 bit :conditional or :always


        register <- seed 1/b
        (let* ((first-iteration (* register (- 2 (* source register))))
               (second-iteration (* first-iteration (- 2 (* source first-iteration))))))


        The pipelined flowchart for this algorithm is hard to describe on paper.
        Those who are really interested should refer to the actual microcode.
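
The two Newton-Raphson iterations in the let* form above can be tried out
directly. The Python sketch below mirrors that form; the function names and
the seed used in the example are illustrative assumptions, and the real seed
source is not described here.

```python
def reciprocal(source, seed):
    """Refine a seed for 1/source with two Newton-Raphson iterations."""
    first_iteration = seed * (2.0 - source * seed)
    second_iteration = first_iteration * (2.0 - source * first_iteration)
    return second_iteration


def divide(a, b, seed):
    """Compute a/b as a * (1/b), given a rough seed for 1/b."""
    return a * reciprocal(b, seed)


if __name__ == "__main__":
    # A seed of 0.3 for 1/3 has a relative error of 0.1.
    print(reciprocal(3.0, 0.3))
```

Each iteration squares the relative error of the estimate (0.1, then 0.01,
then 0.0001 in the example), so the number of correct bits roughly doubles per
iteration and the seed accuracy determines how many iterations are needed.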


Ucode Space:    Common routine: 291
                F/:             13
                F/constant:     13
Performance:                        NO-VPs          VPs
                F/(two):            300 cycles      256 cycles per VP
                                                    + 32 cycles for overhead
                F/(three):          300 cycles      256 cycles per VP
                                                    + 32 cycles for overhead
                F/constant(two):    260 cycles      64 cycles per VP
                                                    + 256 cycles for overhead
                F/constant(three):  268 cycles      96 cycles per VP
                                                    + 256 cycles for overhead


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
A double precision divide on the WTL-3132 can be done in two different
ways: first, the way we are doing it now (performance 70 MFLOPS on Beta); or
second, using the double precision multiply written for the WTL-3132 together
with the table-lookup capabilities of the Sprint chip. The performance of the
second approach will be equal to the following:


1. Look up a 16 bit value from a common table (performance of 16 bit aref).
   A table would have to be 64k bits for this. A smaller table could be used,
   but that would require a longer time to compute the result.


2. Do 5 multiplies and 2 subtracts (on double precision data).


Performance: (+ (16 bit aref) (* 5 F*) (* 2 F-))
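
One way to arrive at the operation count above is a table-seeded
Newton-Raphson divide: two refinement iterations cost 4 multiplies and 2
subtracts, and the final multiply by the dividend brings the total to 5
multiplies. The Python sketch below counts the operations under that reading,
which is an assumption. The table layout, the 8 bit index width and the
restriction of the divisor to [1, 2) are also hypothetical choices made to
keep the example small.

```python
def make_seed_table(index_bits=8):
    """Hypothetical table of reciprocal seeds for divisors in [1, 2)."""
    size = 1 << index_bits
    # Each entry is the reciprocal of the midpoint of its interval.
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def divide_with_table(a, b, table):
    """Return (a/b, multiplies, subtracts) for a divisor b in [1, 2)."""
    assert 1.0 <= b < 2.0
    r = table[int((b - 1.0) * len(table))]  # the table look-up (aref)
    multiplies = subtracts = 0
    for _ in range(2):                      # two Newton-Raphson iterations
        r = r * (2.0 - b * r)
        multiplies += 2
        subtracts += 1
    quotient = a * r                        # final multiply by the dividend
    multiplies += 1
    return quotient, multiplies, subtracts


if __name__ == "__main__":
    print(divide_with_table(3.0, 1.5, make_seed_table()))
```

With the small table used here the result is only accurate to roughly 36
bits. A more accurate seed leaves fewer bits for the iterations to recover,
which is the trade-off between table size and compute time noted in step 1.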
Instructions: Dot Product, Outer Product
Precision:    single
Addr-Mode:    three
Condition:    cond/always


Algorithm on WTL-3132:


        Comment:
        dot   <- sum [a(i)*b(i)]
        outer <- *   [a(i)+b(i)]
        a(i) and b(i) reside in the same processor...


        A op B -> A             32 bit :conditional or :always
        #cycles   MB              FB
        32        A0->Ta          free
        32        B0->Tb          Ta->Reg
        32        A1->Ta          Tb op Reg->Reg
        32        Bypass->MEM     Reg->Bypass
        32    +-->B1->Tb          Ta->Reg
        32    |   A2->Ta          Tb op Reg->Reg
        32    |   MEM->Bypass     Bypass op Reg->Reg
        32    +---Bypass->MEM     Reg->Bypass
        32        free            Reg->Ta
        32        Ta->Dot         free
Ucode Space:    Common routine: 350
                Dot Product:    13
                Outer Product:  13
Performance:    128 cycles per element for conditional or always
                +192 cycles overhead for always
                +224 cycles overhead for conditional


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
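
The two reductions named in the comment above, and the cycle estimate given
under Performance, can be written out as a short Python sketch. The function
names are hypothetical. The cycle formula simply adds the stated per-element
cost to the stated overhead and assumes nothing beyond those figures.

```python
import math


def dot_product(a, b):
    """dot <- sum [a(i)*b(i)]"""
    return sum(x * y for x, y in zip(a, b))


def outer_product(a, b):
    """outer <- * [a(i)+b(i)], as defined in the comment above."""
    return math.prod(x + y for x, y in zip(a, b))


def single_precision_cycles(n_elements, conditional):
    """Estimated cycles: 128 per element plus 192 (always) or 224 (conditional)."""
    overhead = 224 if conditional else 192
    return 128 * n_elements + overhead


if __name__ == "__main__":
    print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(single_precision_cycles(10, conditional=False))
```

The 128 cycles per element correspond to the four 32-cycle rows inside the
loop of the table above.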
Instructions: Dot Product, Outer Product
Precision:    double
Addr-Mode:    three
Condition:    cond/always


Algorithm on WTL-3132:


        Comment:
        dot   <- sum [a(i)*b(i)]
        outer <- *   [a(i)+b(i)]
        a(i) and b(i) reside in the same processor...


        This will have to be implemented in Lisp as a simple
        loop that evaluates the above equation. Performance is
        based on the double precision F* and double precision
        F+.
Instructions: Dot Product, Outer Product
Precision:    double
Addr-Mode:    three
Condition:    cond/always


Algorithm on WTL-3164:


        Comment:
        dot   <- sum [a(i)*b(i)]
        outer <- *   [a(i)+b(i)]
        a(i) and b(i) reside in the same processor...


        A op B -> A             32 bit :conditional or :always
        #cycles   MB                  FB
        32        A0-ls->Ta           free
        32        A0-ms->Tb           Ta->Reg-ls
        32        B0-ls->Ta           Tb->Reg-ms
        32        B0-ms->Tb           free
                  A1-ls->Tc           Ta,Tb op Reg->Reg
                  A1-ms->Td           Ta,Tb op Reg->Reg
              +-->Bypass->Mem         Reg->Bypass
              |   B1-ls->Ta           Td->Reg-ms
              |   B1-ms->Tb           Tc->Reg-ms
              |   A1-ls->Tc           Ta,Tb op Reg->Reg
              |   A1-ms->Td           Ta,Tb op Reg->Reg
              +---Mem->Bypass         Bypass op Reg->Reg
                  free                Reg-ls->Ta
                  Ta->Dot-ls          Reg-ms->Tb
                  Tb->Dot-ms          free
Ucode Space:    Common routine: 450
                Dot Product:    13
                Outer Product:  13
Performance:    258 cycles per element for conditional or always
                +288 cycles overhead for always
                +320 cycles overhead for conditional


Exception Detection Modes:
        Please refer to the section on exception handling at the
        end of this document. The cost for each mode according
        to every function is described there.


Instructions: Dot Product, Outer Product
Precision:    single
Addr-Mode:    three
Condition:    cond/always


Algorithm on WTL-3164:


        Comment:
        dot   <- sum [a(i)*b(i)]
        outer <- *   [a(i)+b(i)]
        a(i) and b(i) reside in the same processor...


        A op B -> A             32 bit :conditional or :always
        #cycles   MB              FB
        32        A0->Ta          free
        32        B0->Tb          Ta->Reg
        32        A1->Ta          Tb op Reg->Reg
        32        Bypass->MEM     Reg->Bypass
        32    +-->B1->Tb          Ta->Reg
        32    |   A2->Ta          Tb op Reg->Reg
        32    |   MEM->Bypass     Bypass op Reg->Reg
        32    +---Bypass->MEM     Reg->Bypass
        32        free            Reg->Ta
        32        Ta->Dot         free
Ucode Space:    Common routine: 350
                Dot Product:    13
                Outer Product:  13
Performance:    128 cycles per element for conditional or always
                +192 cycles overhead for always
                +224 cycles overhead for conditional


Exception Detection Modes: 

Please refer to the section on exception handling at the 
end of this document. The cost for each mode according 
to every function is described there. 
Exception Modes:
There are 4 exception modes:


        1. Record certain (overflow) fatal errors
        2. Trap on certain (overflow) fatal errors
        3. Record any IEEE exception
        4. Trap on any IEEE exception


1. Record Fatal Errors:


We will support a few fatal errors:


        OVERFLOW
        DIVIDE BY ZERO


The implementation is as follows.


Algorithm for Fatal Error Recording on WTL-3132: 


The chip provides the capabilities to detect overflow 
and we will use that. 


OVERFLOW:


The overflow detection is enabled and the status of that
comes out on the FPEX pin. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the overflow exception.


Time to detect and set VP flag: 4 cycles

This algorithm applies to all instructions with the exception
of the polynomial evaluation instructions. They are handled somewhat
differently. After every iteration the bits of the status transposer
are swapped out and ORed with the VP flag in memory.

Time to detect and set VP flag: (* number of iterations 4) cycles
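
The recording scheme above amounts to a sticky flag: the cached status bits
are ORed into the per-processor VP flag, so a flag that has been set stays
set. A minimal Python sketch follows. It models one status bit per processor
as a list entry, and the function names are hypothetical.

```python
def record_exception(vp_flags, status_bits):
    """OR the swapped-out status bits into the VP exception flags."""
    return [flag | bit for flag, bit in zip(vp_flags, status_bits)]


def recording_cycles(iterations=1, cycles_per_swap=4):
    """Cost from the text: 4 cycles per swap, one swap per iteration."""
    return iterations * cycles_per_swap


if __name__ == "__main__":
    flags = [0, 1, 0, 0]
    flags = record_exception(flags, [1, 0, 0, 1])
    print(flags, recording_cycles(3))
```

For ordinary instructions the swap happens once per VP, and for the polynomial
evaluation instructions once per iteration, which is where the
(* number of iterations 4) figure comes from.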


DIVIDE BY ZERO:


This needs to be detected only for the divide operation.


The microcode will scan the exponent of the divisor to
see if it's zero. If so, the microcode will set the VP
divide by zero flag, turn that (those) processor(s)
off, and continue with the execution for the other processors.


Time to detect and set VP flag: 20 cycles
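
The exponent scan can be illustrated in Python on the IEEE single precision
bit layout (1 sign bit, 8 exponent bits, 23 fraction bits). Note that an
all-zero exponent field covers denormalized numbers as well as plus and minus
zero, so a test on the exponent alone treats denormals as zero. The function
names and the list-based model of processors are assumptions made for
illustration.

```python
import struct


def exponent_is_zero(x):
    """True when the 8 bit exponent field of x, as an IEEE single, is zero."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return (bits >> 23) & 0xFF == 0


def mask_zero_divisors(divisors, context, dbz_flags):
    """Set the divide by zero flag and turn off processors with a zero divisor."""
    new_context, new_flags = [], []
    for b, active, flag in zip(divisors, context, dbz_flags):
        hit = active and exponent_is_zero(b)
        new_flags.append(flag or hit)
        new_context.append(active and not hit)
    return new_context, new_flags


if __name__ == "__main__":
    print(mask_zero_divisors([2.0, 0.0, 4.0], [True, True, False], [False] * 3))
```

Processors that were already off are left untouched, and the remaining active
processors go on to perform the divide.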


Algorithm for Fatal Error Recording on WTL-3164:


The chip provides the capabilities to detect overflow
and we will use that. In this case we might want to allow
some other set of fatal errors when this chip is used
in the system, since it has much greater detection
capabilities.


OVERFLOW:


The overflow detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the overflow exception.


Time to detect and set VP flag: 4 cycles


This algorithm applies to all instructions with the exception
of the polynomial evaluation instructions. They are handled somewhat
differently. After every iteration the bits of the status transposer
are swapped out and ORed with the VP flag in memory.


Time to detect and set VP flag: (* number of iterations 4) cycles


DIVIDE BY ZERO:


This needs to be detected only for the divide operation.


The divide by zero detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the divide by zero exception.


Time to detect and set VP flag: 4 cycles
2. Trap and Record Fatal Errors:


We will support traps on the following fatal errors:


        OVERFLOW
        DIVIDE BY ZERO


The implementation is as follows.


Algorithm for Trap on Fatal Error and Recording on WTL-3132 and WTL-3164:


This applies to both chips: just to OVERFLOW on the WTL-3132,
and to OVERFLOW and DIVIDE BY ZERO on the WTL-3164.


If traps are enabled by the user the following happens.


1. The status transposer is swapped out to memory
   (the exception bit is swapped to memory).

2. The exception bit is then fed through the Global OR by the CM chips.

3. If the Global OR is asserted we enter the Fatal Errors
   trap handler; otherwise we go to the next VP. If there are no more VPs
   then we are done.


Trap handler:
        The exception bit is in memory.
        The result that caused that exception is still in the
        Weitek chip.


        1. Move the result out of the Weitek Reg-file into
           memory.

        2. Select the processors that had an exception:
           cm:context-flag <- cm:context-flag and exception-flag

        3. Deliver the proper value into their location based
           on the context-flag (the CM chip does that).

        4. Turn on the processors that had no exception and
           deliver the correct result into their destination.

        5. OR the exception bit with the VP exception flag.

        6. Exit the trap handler. Signal the error to the user.

Time to detect and enter a Trap on Fatal Errors: 6 cycles.

The divide by zero error on the WTL-3132 is handled by a different trap
handler that does not involve the WTL-3132 and is handled totally in
software.
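
The trap-handler steps above can be modeled in a few lines of Python. This is
a sketch under stated assumptions: processors are list positions, the Global
OR is modeled with any(), and the "proper value" delivered to trapping
processors is passed in as the hypothetical parameter trap_value because the
text does not say what that value is.

```python
def global_or(exception_bits):
    """Model of the Global OR: asserted if any processor raised an exception."""
    return any(exception_bits)


def trap_handler(destination, results, exception_bits, context, vp_flags, trap_value):
    """Apply trap-handler steps 2 to 5 and return (destination, vp_flags)."""
    # Step 2: context-flag <- context-flag and exception-flag
    selected = [c and e for c, e in zip(context, exception_bits)]
    out = list(destination)
    for i, is_selected in enumerate(selected):
        if is_selected:
            out[i] = trap_value      # step 3: deliver the proper value
        elif context[i]:
            out[i] = results[i]      # step 4: deliver the correct result
    # Step 5: OR the exception bit with the VP exception flag
    flags = [f or e for f, e in zip(vp_flags, exception_bits)]
    return out, flags


if __name__ == "__main__":
    exc = [False, True, False, False]
    if global_or(exc):
        print(trap_handler([0.0, 0.0, 0.0, 7.0], [1.0, 9.0, 3.0, 4.0],
                           exc, [True, True, True, False], [False] * 4, -1.0))
```

Processors whose context flag was off before the trap keep their old
destination value, as they would have without the exception.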
3. Record any IEEE exception:


We will support all five IEEE exceptions with a certain cost associated
with each one.


        INVALID OPERATION
        OVERFLOW
        DIVIDE BY ZERO
        UNDERFLOW
        INEXACT


The implementation is as follows.


Algorithm for Recording IEEE exceptions on WTL-3132:


This applies only to single precision, since double
precision will be done in software with some assistance
from hardware and there will be no problem detecting
any or all exceptions.


INVALID OPERATION:


Both operands will be scanned before an
operation proceeds; if an invalid operation is found
a VP flag will be set. This is a relatively expensive
operation since we might have to do a few scans over the
numbers to determine the correct invalid operation.


Time to detect this exception: varies with the operation;
the best would be one scan per exponent of each operand.
Hence the best average time would be 36 cycles to set a VP flag.
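
A pre-scan of the operands could look roughly like the Python sketch below,
which classifies each operand from its IEEE single precision exponent and
fraction fields and then checks the operand pair against the operation. The
names are hypothetical. As a simplification the sketch flags any NaN operand,
which is stricter than IEEE 754, where only signaling NaNs raise the invalid
operation exception.

```python
import struct


def classify(x):
    """Classify x by the exponent and fraction fields of its single format."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "nan" if fraction else "inf"
    if exponent == 0 and fraction == 0:
        return "zero"
    return "finite"


def is_invalid(op, a, b):
    """True when `a op b` is an invalid operation (op is '+', '-', '*' or '/')."""
    ka, kb = classify(a), classify(b)
    if "nan" in (ka, kb):
        return True
    if op == "*":
        return {ka, kb} == {"zero", "inf"}            # 0 * inf
    if op == "/":
        return ka == kb and ka in ("zero", "inf")     # 0/0 or inf/inf
    if op in ("+", "-") and ka == kb == "inf":
        same_sign = (a > 0) == (b > 0)
        return same_sign if op == "-" else not same_sign  # inf - inf
    return False


if __name__ == "__main__":
    print(is_invalid("*", 0.0, float("inf")), is_invalid("+", 1.0, 2.0))
```

The common case needs only the exponent fields; the fraction is consulted only
when an exponent is all ones or all zeros, which is why the cost varies with
the operation and its operands.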


OVERFLOW:


The overflow detection is enabled and the status of that
comes out on the FPEX pin. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the overflow exception.


Time to detect and set VP flag: 4 cycles

This algorithm applies to all instructions with the exception
of the polynomial evaluation instructions. They are handled somewhat
differently. After every iteration the bits of the status transposer
are swapped out and ORed with the VP flag in memory.

Time to detect and set VP flag: (* number of iterations 4) cycles


DIVIDE BY ZERO:


This needs to be detected only for the divide operation.


The microcode will scan the exponent of the divisor to
see if it's zero. If so, the microcode will set the VP
divide by zero flag, turn that (those) processor(s)
off, and continue with execution for the other processors.


Time to detect and set VP flag: 20 cycles


UNDERFLOW:


The underflow detection is enabled and the status of that
comes out on the FPEX pin. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the underflow exception.


Time to detect and set VP flag: 4 cycles


If both underflow and overflow are enabled then we cannot
tell which one actually happened, since the chip provides us
only with an OR of both. There are three possible solutions.
The first is to give the user just that OR (which makes us not fully IEEE).
The second is to re-execute the instruction with one exception turned on
and the other off, so that we can correctly determine the
proper exception. The third is to do one operation at a time and
read the status register every time an instruction executes.
This lengthens the operations loop by (* 32 4) cycles for all
instructions.


This algorithm applies to all instructions with the exception
of the polynomial evaluation instructions. They are handled somewhat
differently. After every iteration the bits of the status transposer
are swapped out and ORed with the VP flag in memory.


Time to detect and set VP flag: (* number of iterations 4) cycles

If both overflow and underflow are enabled then the time increases by
(* number of iterations (* 32 4))

INEXACT:


This cannot be done using the chip directly. One can, however, use the chip
to work on a portion of the significand and then use software
to produce the correct result and the proper inexact exception.


The cost is very significant; it depends on the time to do a variable
precision multiply or add using the chip. The times for those are
described further on.
Algorithm for Recording IEEE exceptions on WTL-3164:


This applies to both precisions, single and double.


INVALID OPERATION:


The invalid operation detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the invalid operation exception.

Time to detect and set VP flag: 4 cycles


If the design of the Weitek chip does not provide us with the status
for this operation, then the algorithm from WTL-3132 can be applied:


Both operands will be scanned before an
operation proceeds; if an invalid operation is found
a VP flag will be set. This is a relatively expensive
operation since we might have to do a few scans over the
numbers to determine the correct invalid operation.


Time to detect this exception: varies with the operation;
the best would be one scan per exponent of each operand.
Hence the best average time would be 36 cycles to set a VP flag.


OVERFLOW:


The overflow detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the overflow exception.


Time to detect and set VP flag: 4 cycles

This algorithm applies to all instructions with the exception
of the polynomial evaluation instructions. They are handled somewhat
differently. After every iteration the bits of the status transposer
are swapped out and ORed with the VP flag in memory.

Time to detect and set VP flag: (* number of iterations 4) cycles


DIVIDE BY ZERO:


The divide by zero detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the divide by zero exception.

Time to detect and set VP flag: 4 cycles


UNDERFLOW:


The underflow detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the underflow exception.


Time to detect and set VP flag: 4 cycles


If the design of the Weitek chip does not provide us with the status
for this operation, then the algorithm from WTL-3132 can be applied:
do one operation at a time and read the status register every
time an instruction executes. This lengthens the operations loop by (* 32 4)
cycles for single precision operations and by (* 64 4) cycles for double precision
operations.


This algorithm applies to all instructions with the exception
of the polynomial evaluation instructions. They are handled somewhat
differently. After every iteration the bits of the status transposer
are swapped out and ORed with the VP flag in memory.


Time to detect and set VP flag: (* number of iterations 4) cycles


If the design of the Weitek chip does not provide us with the status
for this operation, then the algorithm from WTL-3132 can be applied and
the time proportionally increases to:


        (* number of iterations (* 32 4)) for single precision
        (* number of iterations (* 64 4)) for double precision


INEXACT:


The inexact detection is enabled and the status of that
comes out on the status pins. The Sprint chip will cache that
bit into its status transposer and at the end of the VP execution
the bit in the status transposer will be swapped out and ORed
with the VP bit for the inexact exception.


Time to detect and set VP flag: 4 cycles


If the design of the Weitek chip does not provide us with the status
for this operation, then the following algorithm can be applied:
do one operation at a time and read the status register every
time an instruction executes. This lengthens the operations loop by (* 32 4)
cycles for single precision operations and by (* 64 4) cycles for double precision
operations.
4. Trap and Record IEEE exceptions:


We will support traps on all the IEEE exceptions:


        INVALID OPERATION
        OVERFLOW
        DIVIDE BY ZERO
        UNDERFLOW
        INEXACT


The implementation is as follows.


Algorithm for Trap on IEEE exceptions and Recording on WTL-3132 and WTL-3164:


This applies to both chips.
If traps are enabled by the user the following happens.


1. The status transposer is swapped out to memory
   (the exception bit is swapped to memory).

2. The exception bit is then fed through the Global OR by the CM chips.

3. If the Global OR is asserted we enter the IEEE exceptions
   trap handler; otherwise we go to the next VP. If there are no more VPs
   then we are done. The trap handler differs from one chip
   to the other. If an exception can be detected in hardware
   then the trap handler with hardware assistance is entered;
   otherwise a trap handler with software assistance is entered.


Trap handler with hardware assistance:
        The exception bit is in memory.
        The result that caused that exception is still in the
        Weitek chip.


        1. Move the result out of the Weitek Reg-file into
           memory.

        2. Select the processors that had an exception:
           cm:context-flag <- cm:context-flag and exception-flag

        3. Deliver the proper value into their location based
           on the context-flag (the CM chip does that).

        4. Turn on the processors that had no exception and
           deliver the correct result into their destination.

        5. OR the exception bit with the VP exception flag.

        6. Exit the trap handler. Signal the error to the user.

Time to detect and enter a Trap on IEEE exception: 6 cycles.


Trap handler with software assistance:
        The exception bit is in memory.
        The result that caused that exception is still being
        manipulated by the software (microcode).


        1. Select the processors that had an exception:
           cm:context-flag <- cm:context-flag and exception-flag

        2. Deliver the proper value into their location based
           on the context-flag (the CM chip does that).

        3. Turn on the processors that had no exception and
           deliver the correct result into their destination.

        4. OR the exception bit with the VP exception flag.

        5. Exit the trap handler. Signal the error to the user.


Time to detect and enter a Trap on IEEE exception: 6 cycles
Doing double precision math on WTL-3132


The following are considered: F*, F+ and F-, F/


Algorithm for implementing double precision F* using WTL-3132:


Since the multiplier on the WTL-3132 is only 24 bits wide and
the result that we can obtain from it is 23 bits, we can
only multiply 11 bits at a time. The 53 bit significand gets
partitioned into chunks of 11 bits (almost 5 chunks).


We are doing A*B -> A


0. Some setup time to load the upper portion of the word with the correct
   value for multiplication; takes 10 cycles.

1. First load (B) values into :transposer-a; the bits from <22:11>
   are the valid bits that we will multiply together. (11 cycles)

2. Load another batch of (A) values into :transposer-b; the bits
   from <22:11> are the valid bits that we will multiply together.
   (11 cycles)

3. Multiply; takes 32 cycles.

4. Unload the result into :transposer-c. Also load the next 11 bits of (A)
   into :transposer-a. (takes 32 cycles)

5. Store the first 22 bits of the first partial product into memory. (takes 22 cycles)

6. Add in the other partial product. ((* 3 22) cycles)

7. Go back to step 3 for 4 more iterations.

8. Go back to step 1 for 4 more iterations.

Time to just multiply the significands: 3750 cycles

Time to add the exponents: 33 cycles

Hence total time to do F* = 3800 cycles or ~140 MFLOPS


This time does not include any overhead for exception handling. 
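
The chunked significand multiply described above can be checked with Python
integers. The sketch below splits each 53 bit significand into five 11 bit
chunks, forms the 25 partial products (each at most 22 bits), and sums them at
the proper bit offsets. The names are hypothetical, and the sketch models only
the arithmetic, not the transposer traffic or the cycle counts.

```python
CHUNK_BITS = 11
NUM_CHUNKS = 5  # 5 * 11 = 55 bits, enough to hold a 53 bit significand


def split_chunks(significand):
    """Split a significand into NUM_CHUNKS chunks, least significant first."""
    mask = (1 << CHUNK_BITS) - 1
    return [(significand >> (CHUNK_BITS * i)) & mask for i in range(NUM_CHUNKS)]


def multiply_significands(a_sig, b_sig):
    """Return (product, number of partial products) using 11 x 11 bit multiplies."""
    a_chunks = split_chunks(a_sig)
    b_chunks = split_chunks(b_sig)
    total = 0
    count = 0
    for j, b in enumerate(b_chunks):          # outer loop over chunks of B
        for i, a in enumerate(a_chunks):      # inner loop over chunks of A
            partial = a * b                   # fits in 22 bits
            assert partial < (1 << 22)
            total += partial << (CHUNK_BITS * (i + j))
            count += 1
    return total, count


if __name__ == "__main__":
    a = (1 << 52) | 0x123456789ABCD
    b = (1 << 53) - 1
    product, count = multiply_significands(a, b)
    print(product == a * b, count)
```

The 5 by 5 loop structure corresponds to steps 7 and 8 above, each of which
repeats its loop body for 4 more iterations.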
Algorithm for implementing double precision F+ and F- using WTL-3132:


It is not worth using the WTL-3132 to do F+ and F-. The floating
point addition algorithm requires a shift left or right. We
can only obtain a shift left by using the chip, and only for a
certain length at a time, and we also have to know how many
leading zeros the significand had so that the exponent can be adjusted
accordingly. Since we would have to determine that anyway, and we cannot
use the chip to help us with that, we might as well do the shift as we scan
for leading zeros. That means the shift will be done on the CM.


Time for double precision F+ and F- = 3900 cycles or ~140 MFLOPS


This time does not include any overhead for exception handling.


Algorithm for implementing double precision F/ using WTL-3132:


We can apply two algorithms to this problem. One is the Newton-Raphson method,
which is a multiplicative method; it requires 2 multiplies and one subtract
per iteration. The number of iterations depends on how long the initial seed is.
One iteration takes 11400 cycles. To do a double precision F/ it will take
(* 7 11400) cycles, or a total time of:


Time for double precision F/ = 80000 cycles or ~6 MFLOPS


The other is the current divide algorithm, which uses the
CM chip and not the Weitek:


Time for double precision F/ = 8000 cycles or ~60 MFLOPS


This time does not include any overhead for exception handling.